
According to Eq. 4.6, the variance in the anti-bandit algorithm is bounded, and the lower/upper confidence bounds can be estimated as
$$\tilde{r}_k - \sqrt{\frac{2\ln N}{n}} \;\le\; r_k \;\le\; \tilde{r}_k + \sqrt{\frac{2\ln N}{n}}. \qquad (4.8)$$

4.2.2 Search Space

Following [307, 151, 291], we search for computation cells as the building blocks of the

final architecture. A cell is a fully connected directed acyclic graph (DAG) of M nodes, i.e.,

{B1, B2, ..., BM} as shown in Fig. 4.13. Here, each node is a specific tensor (e.g., a feature

map in convolutional neural networks), and each directed edge (i, j) between Bi and Bj

denotes an operation $o^{(i,j)}(\cdot)$, which is sampled from $\Omega^{(i,j)} = \{o^{(i,j)}_1, \dots, o^{(i,j)}_K\}$. $\{\Omega^{(i,j)}\}$ is the search space of a cell. Each node $B_j$ takes its dependent nodes as input and is obtained by $B_j = \sum_{i<j} o^{(i,j)}(B_i)$. The constraint $i < j$ avoids cycles in a cell. Each cell takes

the output of the last cell as input. For brevity, we denote by B0 the last node of the

previous cell and the first node of the current cell. Unlike existing approaches that use only

normal and reduction cells, we search for v (v > 2) cells instead. For general NAS search, we

follow [151] and take seven normal operations, i.e., 3×3 max pooling, 3×3 average pooling,

skip connection (identity), 3×3 dilated convolution with rate 2, 5×5 dilated convolution with rate 2, 3×3 depth-wise separable convolution, and 5×5 depth-wise separable convolution. Considering

adversarially robust optimization for NAS, we introduce two additional operations, the 3×3

Gabor filter and denoising block, for model defense. Therefore, the size of the entire search space is $K^{|\mathcal{E}_M| \times v}$, where $\mathcal{E}_M$ is the set of possible edges in the fully connected DAG with $M$ intermediate nodes. In the case with $M = 4$ and $v = 6$, together with the input node, the total number of cell structures in the search space is $9^{(1+2+3+4)\times 6} = 9^{10\times 6}$.
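As an illustration of the cell computation $B_j = \sum_{i<j} o^{(i,j)}(B_i)$, the following is a minimal PyTorch-style sketch of a fully connected DAG cell with sampled operations; the class and argument names are our own placeholders, not the actual search code.

```python
import torch
import torch.nn as nn

class Cell(nn.Module):
    """Fully connected DAG cell with intermediate nodes B1..BM.

    sampled_ops[(i, j)] is the operation chosen for edge (i, j); in the
    search it would be drawn from the K candidates of Omega^(i,j).
    """
    def __init__(self, num_nodes, sampled_ops):
        super().__init__()
        self.num_nodes = num_nodes
        self.ops = nn.ModuleDict(
            {f"{i}->{j}": op for (i, j), op in sampled_ops.items()}
        )

    def forward(self, b0):
        nodes = [b0]  # B0: the last node of the previous cell
        for j in range(1, self.num_nodes + 1):
            # B_j = sum_{i<j} o^(i,j)(B_i); the constraint i < j avoids cycles
            nodes.append(sum(self.ops[f"{i}->{j}"](nodes[i]) for i in range(j)))
        return nodes[-1]  # fed to the next cell as its B0

# Hypothetical usage with M = 2 and identity operations on every edge:
ops = {(i, j): nn.Identity() for j in range(1, 3) for i in range(j)}
cell = Cell(num_nodes=2, sampled_ops=ops)
out = cell(torch.randn(1, 16, 8, 8))
```

Here, we briefly introduce the two additional operations.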

Gabor filter. Gabor filters [69, 68] containing frequency and orientation representations

can characterize the spatial frequency structure in images while preserving spatial relation-

ships. This operation provides superb robustness for the network [191]. Gabor filters are defined as $\exp\!\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\!\left(2\pi \frac{x'}{\lambda} + \psi\right)$, where $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$.

σ, γ, λ, ψ, and θ are learnable parameters. Note that the symbols used here apply only

to the Gabor filter and are different from the symbols used in the rest of this chapter.

Figure 4.2(b) shows an example of Gabor filters.
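For intuition, the snippet below is a minimal NumPy sketch that evaluates the Gabor formula above on a 3×3 grid, assuming fixed rather than learnable parameters; the function name and default values are illustrative.

```python
import numpy as np

def gabor_kernel(size=3, sigma=1.0, gamma=0.5, lam=2.0, psi=0.0, theta=np.pi / 4):
    """Gabor kernel from the formula in the text.

    sigma: Gaussian envelope width, gamma: spatial aspect ratio,
    lam: wavelength, psi: phase offset, theta: orientation.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinate x'
    y_r = -x * np.sin(theta) + y * np.cos(theta)   # rotated coordinate y'
    return np.exp(-(x_r**2 + gamma**2 * y_r**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * x_r / lam + psi)

print(gabor_kernel().round(3))
```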

Denoising block. As described in [253], adversarial perturbations on images will intro-

duce noise in the features. Therefore, denoising blocks can improve adversarial robustness by

denoising features. Following this, we add the nonlocal mean denoising block [22] as shown

in Fig. 4.2(c) to the search space to denoise the features. A denoised feature map $z$ is computed from an input feature map $x$ by taking a weighted mean over all spatial locations $\mathcal{L}$ of the features:
$$z_p = \frac{1}{\mathcal{C}(x)} \sum_{q \in \mathcal{L}} f(x_p, x_q) \cdot x_q,$$
where $f(x_p, x_q)$ is a feature-dependent weighting function and $\mathcal{C}(x)$ is a normalization function.
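The following is a minimal sketch of this non-local mean, assuming the embedded dot product for $f(x_p, x_q)$ and a softmax as the normalization $\mathcal{C}(x)$, one common instantiation in [22]; the 1×1 convolution and residual connection of the full block are omitted.

```python
import torch
import torch.nn.functional as F

def nonlocal_mean(x):
    """Non-local mean over all spatial locations L of a feature map.

    x: (batch, channels, height, width). For each location p,
    z_p = (1/C(x)) * sum_q f(x_p, x_q) * x_q, with the dot product
    as f and a softmax over q as the normalization C(x).
    """
    b, c, h, w = x.shape
    feats = x.view(b, c, h * w)                     # flatten spatial locations
    attn = torch.bmm(feats.transpose(1, 2), feats)  # f(x_p, x_q) for all p, q
    attn = F.softmax(attn, dim=-1)                  # 1/C(x) normalization
    z = torch.bmm(feats, attn.transpose(1, 2))      # weighted mean over q
    return z.view(b, c, h, w)

# Hypothetical usage: denoise a random 8x8 feature map with 16 channels.
z = nonlocal_mean(torch.randn(2, 16, 8, 8))
```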

4.2.3 Anti-Bandit Strategy for NAS

As described in [274, 291], the validation accuracy ranking of different network architectures

is not a reliable indicator of the final quality of the architecture. However, the experimental

results suggest that if an architecture performs poorly at the beginning of training, there

is little hope that it can be part of the final optimal model [291]. As training progresses, this observation becomes more and more certain. On the basis of this observation, we derive a simple yet effective training strategy. During training, as the epochs increase,